DO VALUE-ADDED METHODS LEVEL THE PLAYING FIELD FOR TEACHERS?


HIGHLIGHTS 

• Value-added measures partially level the playing field by controlling for many student 
characteristics. But if they don't fully adjust for all the factors that influence achievement and that 
consistently differ among classrooms, they may be distorted, or "confounded." 

• Simple value-added models that control for just a few test scores (or only one score) and no
other variables produce measures that underestimate the effectiveness of teachers with
low-achieving students and overestimate that of teachers with high-achieving students.

• The evidence, while inconclusive, generally suggests that confounding is weak. But it would not be
prudent to conclude that confounding is not a problem for any teacher. In particular, the evidence
on comparing teachers across schools is limited.

• Studies assess general patterns of confounding. They do not examine confounding for individual 
teachers, and they can't rule out the possibility that some teachers consistently teach students 
who are distinct enough to cause confounding. 

• Value-added models often control for variables such as average prior achievement for a classroom 
or school, but this practice could introduce errors into value-added estimates. 

• Confounding might lead school systems to draw erroneous conclusions about their teachers - 
conclusions that carry heavy costs to both teachers and society. 


INTRODUCTION 

Value-added models have caught the interest of policymakers because, unlike other ways of using
student test scores for accountability, they purport to "level the playing field." That is, they
supposedly reflect only a teacher's effectiveness, not whether she teaches high- or low-income
students, for instance, or students in accelerated or standard classes. Yet many people are concerned
that a teacher's value-added score will be sensitive to the characteristics of her students.
More specifically, they believe that teachers of low-income, minority, or special education students 
will have lower value-added scores than equally effective teachers who are teaching students outside 
these populations. Other people worry that the opposite might be true: that some value-added
models might cause teachers of low-income, minority, or special education students to have higher 
value-added scores than equally effective teachers who work with higher-achieving, less risky 
populations. 

In this brief, we discuss what is and is not known about how well value-added measures level the 
playing field for teachers by controlling for student characteristics. We first discuss the results of 
empirical explorations. We then address outstanding questions and the challenges to answering them 
with empirical data. Finally, we discuss the implications of these findings for teacher evaluations and 
the actions that may be based on them.


WHAT IS KNOWN ABOUT HOW VALUE-ADDED MODELING LEVELS THE
PLAYING FIELD?

Value-added modeling uses statistical methods to isolate the contributions of teachers from other 
factors that influence student achievement. It does this by using data for individual students, such as 
scores on standardized tests, special education and English-learner status, eligibility for free and 
reduced-price meals (a proxy for poverty), and race and ethnicity. It sometimes controls for the 
average classroom or school-wide scores on previous years' tests or for Census data, class size, or a 
principal's years of experience. 
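
As a rough illustration of this idea, the sketch below regresses current scores on prior scores
and averages each teacher's residuals. It is a minimal sketch, not the specification of any model
used in practice: it uses a single prior score and none of the other controls listed above, and
the function name `simple_value_added` and the student records are invented for illustration.

```python
from statistics import mean


def simple_value_added(records):
    """Average residual per teacher from a one-covariate regression.

    records: list of (teacher_id, prior_score, current_score) tuples.
    All names and data here are hypothetical.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    x_bar, y_bar = mean(priors), mean(currents)

    # Ordinary least squares with one predictor (the prior score).
    sxx = sum((x - x_bar) ** 2 for x in priors)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, currents))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # A teacher's "value added" is the mean gap between her students'
    # actual scores and the scores predicted from their prior scores.
    residuals = {}
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals.setdefault(teacher, []).append(current - predicted)
    return {teacher: mean(vals) for teacher, vals in residuals.items()}


# Invented data: both teachers have students with the same prior scores,
# but teacher A's students gain 7 points and teacher B's gain 3.
records = [
    ("A", 40, 47), ("A", 50, 57), ("A", 60, 67),
    ("B", 40, 43), ("B", 50, 53), ("B", 60, 63),
]
va = simple_value_added(records)
print(va)
```

In this toy case the two classes are identical on the one control variable, so the difference in
average residuals reflects only the difference in gains. The concern discussed next is what
happens when classes differ in ways the controls do not capture.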

Despite these controls, people still worry that value-added estimates may be distorted, or, in 
statisticians' terms, "confounded." Value-added measures are said to be confounded if they are 
subject to change because of students' socio-economic backgrounds or other student-level 
characteristics, and also if teachers who are equally effective have persistently different value-added 
scores because of the types of students they teach. For example, confounding occurs if teachers of 
low-income or minority students have lower (or higher) scores than equally effective teachers who
teach groups that tend to be higher-achieving. In short, confounding means that we cannot determine 
the educators' contributions distinct from those of the students they teach. 

Confounding might occur because the statistical model doesn't measure or properly control for all the 
factors that contribute to student achievement. For example, it might not fully account for students 
with unique disabilities. 

For confounding to occur, these kinds of factors must consistently be associated with the students of a 
particular teacher, so that they result in a value-added score that consistently underestimates or 
overestimates her effectiveness. For example, suppose that every year a fifth grade teacher is 
assigned highly gifted students whose learning is not captured by the yearly achievement test, and 
that her value-added measure does not account for the gifted status of these students. We consider 
this teacher's value-added score to be confounded. It is persistently too low. 

On the other hand, suppose that by complete chance, a fifth grade teacher has five highly gifted 
students in her class in one year, but that she does not have similar students in other years and might 
not even have similar students in other class assignments she could have had that same year. This 
teacher's value-added score is not confounded, even though the model does not completely account 
for her students' gifted status, because it is only by chance that this teacher had such good students,
and her luck would not be expected to continue. Although this chance assignment does not result
in confounding, it does create an error in the teacher's value-added score for that year. 1

The concern with confounding is that student characteristics will distort measures of teacher
effectiveness in predictable ways: teachers in high-poverty schools might consistently receive scores
that are too low, teachers of English language learners might consistently receive scores that are too
high, and so on. To avoid these risks, value-added models must fully control for background variables
that could be persistently associated with any teacher.


RESULTS FROM THE LITERATURE

It is difficult to test for confounding of value-added estimates using data collected on students and 
their teachers in standard settings. We do not know how effective teachers really are, so we can't 
compare our value-added estimates to the "truth." Moreover, the only student data we have for 
teachers are from the types of students who we think might cause confounding, and the very variables 
we might manipulate to test for confounding are the same ones the value-added model uses as 
controls in the first place. 

Still, researchers have conducted some clever tests to assess the likelihood of confounding. The 
research supports one conclusion: value-added scores for teachers of low-achieving students are 
underestimated, and value-added scores of teachers of high-achieving students are overestimated by 
models that control for only a few scores (or for only one score) on previous achievement tests 
without adjusting for measurement error. 2,3 An example of such a model is a student growth 
percentile 4 model used to calculate median growth percentiles for a teacher of fourth grade 
mathematics that controls only for students' prior third grade mathematics scores. 5 
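
The mechanism behind this finding can be sketched with a small simulation. The sketch below is an
illustration under stated assumptions, not a reproduction of any cited study: it uses a linear
regression on one noisy prior score rather than a student growth percentile model, both simulated
teachers are given exactly the same (zero) effect, and every name and parameter value is invented.
Because the prior score measures ability with error, the regression slope falls below 1, and the
leftover difference in ability is attributed to the teachers.

```python
import random
from statistics import mean


def class_mean_residuals(rows):
    """Regress current on prior score (one covariate) and average residuals by teacher."""
    xs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    by_teacher = {}
    for teacher, x, y in rows:
        by_teacher.setdefault(teacher, []).append((y - y_bar) - slope * (x - x_bar))
    return slope, {t: mean(v) for t, v in by_teacher.items()}


def simulate(n_per_class=2000, noise_sd=0.7, seed=0):
    """Two equally effective teachers; one class is lower-achieving on average."""
    rng = random.Random(seed)
    rows = []
    for teacher, class_mean in (("low", -1.0), ("high", 1.0)):
        for _ in range(n_per_class):
            ability = rng.gauss(class_mean, 1.0)
            prior = ability + rng.gauss(0.0, noise_sd)    # noisy measure of ability
            current = ability + rng.gauss(0.0, noise_sd)  # teacher effect is 0 for both
            rows.append((teacher, prior, current))
    return class_mean_residuals(rows)


slope, va = simulate()
print(f"slope on prior score: {slope:.2f}")
print(f"estimated value added: low={va['low']:.2f}, high={va['high']:.2f}")
```

Under these assumptions the teacher of the lower-achieving class receives a negative estimate and
the teacher of the higher-achieving class a positive one, even though neither contributes
anything different to her students.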

Beyond this point, the evidence is contradictory. There are studies suggesting confounding does not 
exist 6 and others suggesting it might. 7 All of the studies have limitations, and because they are so 
hard to conduct, none is definitive. The evidence may favor the conclusion that confounding generally 
is weak. But because the evidence supporting this claim comes from just a few limited studies, and 
because other studies contradict some of that evidence, it would be unwise to conclude that 
confounding is not a problem. Moreover, the studies take in broad samples of teachers; they can't 
rule out the possibility that some teachers consistently teach students who are distinct enough in 
some way to cause confounding. 


STUDIES THAT SUGGEST CONFOUNDING DOES NOT EXIST 

Two rather compelling studies find no evidence of confounding, although neither has been replicated,
and neither addresses all sources of confounding. Clearly, value-added models do not account for 
every factor that might contribute to student learning; the question is whether they account for 
enough variables so that any factors not controlled by the model are not persistently associated with 
teachers. 8 

The first study that found no evidence of confounding looked at large samples of students and 
teachers from a single urban district over several years. 9 It merged student test scores with data from 
the tax returns of the students' families to see if income and other data from the returns 10 were
related to value-added estimates. The study found that the data (not typically available to districts)
were not associated with value-added estimates. 11 

The same study used several years of value-added estimates to devise another clever test of potential 
confounding. The researchers ask us to suppose that a top-performing teacher had been teaching 
grade 5 at the same school for several years. She is one of three grade 5 teachers at the school. Then 
the researchers ask us to suppose that the teacher leaves. Given that the school is unlikely to replace 
this exceptional teacher with an equally good performer, the grade 5 cohorts will average lower 
achievement in the years after she leaves because one-third of the students will now have a less
effective teacher than did the students in the previous cohorts. In other words, if we follow
sequential cohorts of students from a grade level in a school, we should expect to see a drop in
achievement when a teacher with a very high value-added estimate leaves the school (or a particular
grade), and an increase in achievement when a teacher with low value added leaves. The magnitude
of the change will depend on the value added of the teacher who transfers.

The study tested this hypothesis and found that these patterns held. 12 And the degree of change was 
very much in line with the predictions based on the teachers' value added. If confounding were 
strong, the value-added estimate for a really good teacher would be due to the students the teacher 
was assigned, not to the teacher herself. Consequently, the loss of a teacher with high value added
would not affect average achievement, since removing that teacher would not change the cohorts of
students, and cohort-to-cohort changes in achievement would be smaller than those predicted by
value added. The fact that the changes are consistent with value-added predictions suggests that
confounding is limited. 13 
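
The arithmetic behind this test can be sketched as follows. The numbers are invented for
illustration and are not taken from the study; value added is assumed to be expressed in
student-level standard deviation units, and the function names are hypothetical.

```python
def predicted_cohort_change(departing_va, replacement_va, share_of_students):
    """Predicted change in a grade's average achievement when one teacher is replaced."""
    return (replacement_va - departing_va) * share_of_students


def forecast_ratio(observed_change, predicted_change):
    """Observed change as a fraction of the change predicted from value added.

    A ratio near 1 is what we would expect if value added reflects the teacher;
    a ratio near 0 is what we would expect if it mostly reflects her students.
    """
    return observed_change / predicted_change


# Hypothetical case from the text: one of three grade 5 teachers leaves.
# Assume her value added was 0.30 and her replacement is average (0.0).
predicted = predicted_cohort_change(
    departing_va=0.30, replacement_va=0.0, share_of_students=1 / 3
)

# Invented observed change in the grade's average score after she leaves.
observed = -0.09
print(f"predicted change: {predicted:.3f}")
print(f"forecast ratio:   {forecast_ratio(observed, predicted):.2f}")
```

In practice such a comparison would be made across many teacher departures and arrivals, since
any single cohort-to-cohort change also reflects chance differences between cohorts.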

The second study 14 that found no evidence of confounding compared value-added estimates when 
schools followed standard practices for assigning classes to teachers with value-added estimates when 
classes were randomly assigned to teachers. It used 78 pairs of teachers; both taught the same grade 
in the same school, and both remained at that school for two consecutive years. Value added cannot 
be confounded in the year that students were randomly assigned to classes since assignments 
resulted from the luck of the draw. The study found that value-added estimates on randomly assigned 
classes were statistically equivalent to the value-added estimates from classes that were assigned 
using standard practices. This result suggests that the measure from standard practice is not
confounded. However, the study tests for confounding only among teachers within a school. It does 
not provide evidence about confounding by differences among students from different schools. 15 


STUDIES THAT SUGGEST CONFOUNDING EXISTS 

As noted above, one set of studies finds compelling evidence that value-added measures from certain 
very simple models typically will be confounded. Two other studies have taken different approaches, 
and both demonstrate the potential for confounding, although neither proves it exists. 

Studies that suggest that overly simple models will typically confound value-added measures do so by
following a common process. First, they estimate value added from simple models, such as those that control for
only a few prior test scores and don't make complicated adjustments for measurement error. Next, 
they estimate value added using more complex models. Finally, they show that the value added from 
simpler models is more strongly related to the background characteristics of the teachers' classes than 
the value added from more complex models. The studies argue that the relationship between value 
added and student variables can't be due to teachers since it doesn't exist for value-added estimates 
made with the complex models. It must result from the failure of simple models to fully control for 
differences among the students taught by different teachers. 16 

Another study compared the variability of value-added estimates in schools in which class
assignments appeared to be random with that in schools in which assignments were distinctly
non-random, that is, the distributions of student background variables were too different across
classrooms to have occurred by chance. 17 Value-added estimates varied more among teachers in
schools that did not appear to assign students at random than in schools that did. This finding is
consistent with what we would expect to see if the estimates confounded student background 
variables with teacher effectiveness. The schools with non-random differences among classes have an 
additional source of variability in their value-added estimates because they conflate student variables 
with real teacher effectiveness. The other schools do not have this other factor because the student 
background variables do not vary among classes. However, we might also observe this empirical 
finding if (1) there were no confounding and (2) schools that did not appear to use random 
assignment had staffs that were more varied in their effectiveness. The study cannot rule out this 
alternative. 

As we know, value-added estimates are based on statistical models. These models will not confound 
value added with student background variables under specified assumptions about how these 
variables relate to student achievement. If the assumptions do not hold, then confounding can occur. 

A test for confounding can then be constructed as a test of the model's assumptions. Tests of model 
assumptions are complicated, but one test 18 has an intuitive key component: it tests whether 
students' future teacher assignments are associated with students' current achievement scores given 
the control variables that are specified in the model. Since future teachers cannot directly affect 
students' current achievement, a finding that future teacher assignments are associated with those 
outcomes ("the future predicts the past") suggests a violation of the model assumptions that could 
lead to confounding. When applied to data from North Carolina, the test found strong evidence that 
students' future teacher assignments predict the students' current scores. 19 

This test has been replicated with other data, and those studies also found strong evidence that future 
teacher assignments predict current scores. The test of model assumptions is not the same as a test 
for bias. Recent theoretical findings show that if students are tracked on the basis of their current test 
scores, the test of assumptions might fail, but value added might not be confounded. Failing the test 
of assumptions says that confounding is possible, but it doesn't guarantee that it exists. 
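
The mechanics of the "future predicts the past" check can be sketched with simulated data. This
is only an illustration of the logic, not the published test: the model controls for a single
prior score, students are tracked to next year's teacher on an unmeasured trait (here called
ability), and all names and parameter values are assumptions. The sketch compares the gap in
current-year residuals between the students of two future teachers, with and without tracking.

```python
import random
from statistics import mean


def residuals(priors, currents):
    """Residuals of current scores after controlling for one prior score."""
    x_bar, y_bar = mean(priors), mean(currents)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, currents))
             / sum((x - x_bar) ** 2 for x in priors))
    return [(y - y_bar) - slope * (x - x_bar) for x, y in zip(priors, currents)]


def future_teacher_gap(tracked, n=4000, noise_sd=0.7, seed=1):
    """Gap in mean current-year residuals between students of future teachers X and Y."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n)]
    priors = [a + rng.gauss(0.0, noise_sd) for a in abilities]
    currents = [a + rng.gauss(0.0, noise_sd) for a in abilities]
    if tracked:
        # Next year's teacher depends on a trait the model does not fully capture.
        future = ["X" if a > 0 else "Y" for a in abilities]
    else:
        # Next year's teacher is assigned by the luck of the draw.
        future = [rng.choice("XY") for _ in range(n)]
    res = residuals(priors, currents)
    mean_x = mean([r for r, f in zip(res, future) if f == "X"])
    mean_y = mean([r for r, f in zip(res, future) if f == "Y"])
    return mean_x - mean_y


gap_tracked = future_teacher_gap(tracked=True)
gap_random = future_teacher_gap(tracked=False)
print(f"gap with tracking:          {gap_tracked:.2f}")
print(f"gap with random assignment: {gap_random:.2f}")
```

A large gap under tracking shows that future assignments carry information about current scores
that the control variable misses. As noted above, such a result signals that confounding is
possible; it does not by itself show that value-added estimates are confounded.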


WHAT MORE NEEDS TO BE KNOWN ON THIS ISSUE? 

Value-added models clearly go a long way toward leveling the playing field by controlling for many 
student variables that differ among classrooms. Whether value added fully levels the playing field is a 
question that can't be answered without more evidence on confounding and the conditions under 
which it is likely to occur. Given the extensive controls used in value-added models, it is possible that 
even if confounding occurs, for many teachers it could lead to errors smaller than those produced by 
other means of teacher evaluation. 20 

Most of the studies of confounding have not been replicated by other researchers in other places. 
Consider, for example, the study that shows that cohort-to-cohort changes in achievement are 
predicted by the value added of teachers who leave a school or grade. That study excluded many 
students in the district because their data were incomplete. It is not clear whether the results would 
have differed if all the students had been used. One way to alleviate concerns about the data would 
be to replicate the study. If the results of this study were replicated, they could provide fairly strong 
evidence that confounding is limited.

The studies described in the previous sections provide evidence about general patterns of
confounding. They do not assess the confounding for individual teachers. Results for some teachers 
may differ from the general patterns, so districts should look carefully at the conditions of teachers 
who receive consistently low or high scores: What types of students do these teachers have? And how 
might they differ from other classes? 

More generally, states and districts could better judge the merits of value-added measures if they 
knew more about the conditions under which there was strong or weak confounding. Studies across 
states and districts with different policies and populations could help expose these conditions. For 
instance, researchers could conduct studies in districts that do and do not track students in secondary 
schools, and they could determine if confounding appears stronger or more likely when tracking 
occurs. 

Similarly, states and districts need guidance about which value-added model best mitigates 
confounding. The studies that find no evidence of confounding include classroom-level variables and 
sometimes school-level variables. The other studies do not include these variables. The relationship
between value-added estimates and background variables is very sensitive to the inclusion of such
aggregates, 21 and the results on confounding may be, too. Controlling for aggregate variables might
even cause confounding for some value-added approaches. 22 On the other hand, not controlling for
aggregate variables might also result in confounding. It would be valuable to know how much the 
difference in the models contributes to the differences in the results. An answer could come from 
replicating the studies on confounding using different models. 

The studies discussed here focus only on elementary- and middle-school students and teachers; they 
do not include high schools, where results might be different. Standardized tests in high schools are 
less directly related to the coursework of students, and end-of-course tests may be taken by very 
selective samples of students. Moreover, the previous test scores of high school students, which are 
the most important variable used in value-added models, may be only weakly related to standardized 
or end-of-course tests. This is because of the time lapse between testing, 23 lack of alignment between
material covered by the tests, 24 or both. We need empirical studies on high schools before we can
draw any conclusions about how level the field is for these teachers. 

An issue related to confounding is whether a teacher's effectiveness depends on the students she is 
teaching and the environment in which she teaches, and whether she would be more or less effective 
in a different context. 25 But the above studies consider confounding by student characteristics only in
given schools and classrooms. If teacher effectiveness depends on context, this is another source of 
error that calls for further study. 


WHAT CAN'T BE RESOLVED BY EMPIRICAL EVIDENCE ON THIS ISSUE? 

The question of whether value-added measures are truly confounded probably can never be resolved 
completely with empirical evidence. Empirical studies can determine whether the measures appear 
to be confounded for teachers in the grade levels, schools, districts, and states that were studied, but 
they can't rule out different conclusions in other settings. Studies from middle schools might not apply 
to high schools, for instance, and some teachers might teach students who are so different from other 
students that value-added measures fail to account for their achievement levels.

It is also very challenging to differentiate between a teacher's contribution to student achievement
and the influences of the school, the classroom, and a student's peers. The separate contributions can 
only be studied when teachers move, and even then, we must assume that the effects of these other 
influences and the effectiveness of the teacher remain constant across time. Value-added models 
often control for variables such as school-wide average scores on previous tests, and, as noted above, 
controlling for such averages could introduce errors into the value-added measure. Or it could in the 
future if schools change how teachers are assigned to classes once the value-added estimates become 
available. 

Even if we conclude that value-added measures are not confounded, we will never know the true 
effectiveness of a teacher working in all different classroom situations. 


PRACTICAL IMPLICATIONS

How Does This Issue Impact District Decision Making? 

Because the evidence on confounding by student background variables is mixed, those who make 
decisions about teachers should allow for its potential to occur. Confounding can lead to serious 
errors with heavy consequences for teachers and society: ineffective teachers may be deemed 
effective and vice-versa. So, districts need accurate measures of these errors, and they must consider 
them if they plan to use value-added measures for high-stakes decisions. 

Another implication of confounding is that these errors, and the resulting conclusions about
teachers, will be associated with different groups of students. The errors may serve to discredit
value-added modeling, precisely because the models purportedly remove such associations. Teachers 
might even be discouraged from teaching certain groups of students if confounding results in their 
getting consistently low value-added scores. 

It is important for policymakers to remember that certain conditions, such as how teachers are 
assigned to schools and classes, can change over time and that these changes can affect confounding. 
The above studies used data that were collected before states and districts reported value-added
scores to teachers or used them for evaluations. A decision to use these scores for evaluations might in
itself change the schooling environment in ways that could lead to different results. For example, 
teachers with higher value-added scores might transfer to certain kinds of schools, thus creating a 
new association between teacher effectiveness and student background variables that does not yet
exist but could arise in the future.

To reduce the risk of confounding in value-added estimates, states and districts should avoid the 
simple value-added models described above. They should control for multiple previous test scores and 
account for measurement error in those tests. They should use models advocated by studies that 
compare alternative value-added approaches, 26 and if they work with a vendor, use one that has given
the potential for confounding careful consideration. 

If decision-makers suspect confounding, they might be wise to limit the comparisons they make with 
value-added estimates. For instance, they might want to compare teachers only to their peers in
similar schools, or compare only teachers within the same school. Districts might also study the
relationship between value-added measures and student background variables. A strong relationship 
between value-added measures and background variables could indicate confounding or disparity in 
the assignment of teachers. Districts might pay particular attention to teachers of students with 
uncommon characteristics. They might monitor these teachers' value-added scores for consistent 
highs or lows and check how they relate to other measures of teaching, such as classroom 
observations. The relationship between value-added estimates and other measures should be the 
same for these teachers as it is for others. Districts could also track, over time, the average 
achievement of grade-level cohorts within schools to determine if performance changes as predicted 
by the value added by teachers who transfer into or out of schools and grades. 
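
One simple way a district might begin to study the relationship between value-added measures and
student background variables is to compute a correlation across teachers. The sketch below is a
minimal illustration with invented data; the variable names and values are assumptions, and a
real analysis would involve many more teachers and several background variables.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data for five teachers: the share of each teacher's students who are
# eligible for free or reduced-price meals, and her value-added estimate.
share_low_income = [0.1, 0.3, 0.5, 0.7, 0.9]
value_added = [0.15, 0.10, -0.05, -0.05, -0.15]

r = pearson_r(share_low_income, value_added)
print(f"correlation between class composition and value added: {r:.2f}")
```

A strong correlation like the one in this invented example would not settle the question. As
noted above, it could reflect confounding or a real disparity in how teachers are assigned, so it
is best read as a prompt for the further checks described in this section.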

Finally, districts should monitor student achievement, along with scores for teacher observations, to 
determine whether the use of value-added measures in evaluations is doing what is most important - 
improving teaching and learning. 